HOTFIX: 127.0.0.1 not localhost in verifier healthcheck (B02 Phase 2 follow-up) by pulkitpareek18 · Pull Request #36 · zeroauth-dev/ZeroAuth

pulkitpareek18 · 2026-05-15T07:26:38Z

Hotfix for PR #35. Production returned 502s for ~3 minutes between 07:21 and 07:25 UTC. The new verifier container failed its healthcheck, so `zeroauth-prod` (which depends on it via `condition: service_healthy`) never started.

Root cause

Alpine ships busybox `wget`. Busybox `wget` resolves `localhost` to `::1` (IPv6) first and does NOT fall back to `127.0.0.1` (IPv4) when that connection is refused. The verifier binds `0.0.0.0`, which is IPv4-only, so nothing is listening on `[::1]:3001`.

```text
Connecting to localhost:3001 ([::1]:3001)
wget: can't connect to remote host: Connection refused
```

The verifier was running and serving HTTP perfectly the whole time. Only the healthcheck command was wrong.
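The difference can be reproduced by hand. Running the probe against both hostnames from inside the container uses the same busybox `wget` that the healthcheck uses. This is a minimal sketch: `probe_loopback` is a hypothetical helper that is not part of the repo, and it assumes the `/health` endpoint and port 3001 shown in this PR.

```bash
#!/usr/bin/env bash
# probe_loopback CONTAINER PORT
# Runs busybox wget inside CONTAINER against /health on both "localhost"
# and "127.0.0.1" and reports which of the two connects.
probe_loopback() {
  local container="$1" port="$2" host
  for host in localhost 127.0.0.1; do
    if docker exec "$container" wget -qO- "http://${host}:${port}/health" >/dev/null 2>&1; then
      echo "${host}: ok"
    else
      echo "${host}: failed"
    fi
  done
}
```

Run it as `probe_loopback zeroauth-verifier 3001`. Going by the log above, `localhost` should report `failed` and `127.0.0.1` should report `ok`.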

Manual recovery already done

I SSH'd to the VPS at 07:25 UTC and ran:

```bash
cd /opt/zeroauth && docker compose --profile prod up -d --no-deps zeroauth-prod
```

That started `zeroauth-prod` without waiting for the `service_healthy` dependency. Production has been serving traffic normally since. The API IS actually reaching the verifier (its `VERIFIER_URL=http://zeroauth-verifier:3001` env was preserved). The verifier service itself works fine; only Docker's healthcheck status is wrong.

So this PR is a "correctness restoration", not an emergency. Without this fix, the next `docker compose up -d --build --remove-orphans` (e.g. the next deploy) would hang on the dependency wait again.

What changed

Two-line fix in two places:

- `Dockerfile` verifier-production stage HEALTHCHECK: `localhost` → `127.0.0.1`
- `docker-compose.yml` zeroauth-verifier healthcheck: same change.

Both carry a comment explaining why `localhost` is wrong, so the next operator doesn't revert.
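The comments explain the choice, but nothing enforces it. A hypothetical CI guard could close that gap; it is not part of this PR. The guard below fails when an uncommented line in the two files touched here still probes `http://localhost`. Its pattern deliberately skips text after a `#`, so the new explanatory comments do not trip it.

```bash
#!/usr/bin/env bash
# check_no_localhost_probe FILE...
# Hypothetical guard. Returns 0 if no uncommented line mentions http://localhost,
# 1 if one does (matches are printed to stderr), and 2 if a file can't be read.
check_no_localhost_probe() {
  local hits rc
  # ^[^#]* : the match must occur before any '#', so comments are ignored
  hits="$(grep -nH -E '^[^#]*http://localhost' "$@")"
  rc=$?
  case "$rc" in
    0) echo "health probe still targets localhost (use 127.0.0.1):" >&2
       echo "$hits" >&2
       return 1 ;;
    1) return 0 ;;
    *) echo "could not read one of: $*" >&2
       return 2 ;;
  esac
}
```

Usage: `check_no_localhost_probe Dockerfile docker-compose.yml`.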

Verified

```bash
ssh root@104.207.143.14 \
  'docker exec zeroauth-verifier wget -qO- http://127.0.0.1:3001/health'
# {"status":"ok","version":"0.1.0","vkeyAvailable":true,"uptimeSeconds":202}
```
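The test plan asks for 200 + valid JSON. The sketch below is a rough shell check on the response body. It assumes the compact JSON shape shown above, with no whitespace between keys and values. `health_ok` is a hypothetical name, and a JSON parser such as `jq` would make a sturdier check.

```bash
#!/usr/bin/env bash
# health_ok: reads a /health response body on stdin and succeeds only if it
# reports "status":"ok" and "vkeyAvailable":true (compact JSON assumed).
health_ok() {
  local body
  body="$(cat)"
  [[ $body == *'"status":"ok"'* && $body == *'"vkeyAvailable":true'* ]]
}
```

Example: `ssh root@104.207.143.14 'docker exec zeroauth-verifier wget -qO- http://127.0.0.1:3001/health' | health_ok && echo healthy`.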

Test plan

- Local: `docker build --target verifier-production` builds cleanly
- On VPS: `wget http://127.0.0.1:3001/health` from inside the verifier container returns 200 + valid JSON
- CI green on this PR
- After merge: deploy completes, BOTH containers are healthy (not just zeroauth-prod), and the dependency edge re-activates cleanly
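For the last item, polling `docker inspect` is one way to confirm that both containers actually reach `healthy` after the deploy. The sketch assumes the container names match the compose service names, as `zeroauth-verifier` does in the `docker exec` commands above. The helper name `wait_healthy` and its 2-second poll interval are illustrative choices.

```bash
#!/usr/bin/env bash
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Polls Docker's recorded health status every 2s until it is "healthy"
# or the timeout (default 60s) elapses.
wait_healthy() {
  local container="$1" timeout="${2:-60}" status elapsed=0
  while (( elapsed < timeout )); do
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
    if [[ $status == healthy ]]; then
      echo "$container: healthy"
      return 0
    fi
    sleep 2
    elapsed=$(( elapsed + 2 ))
  done
  echo "$container: not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

Usage: `wait_healthy zeroauth-verifier 120 && wait_healthy zeroauth-prod 120`. This checks the verifier first, which mirrors the `service_healthy` dependency order.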

🤖 Generated with Claude Code

The deploy after PR #35 succeeded in building both containers, but the verifier never became 'healthy' from Docker's perspective:

`Connecting to localhost:3001 ([::1]:3001)`
`wget: can't connect to remote host: Connection refused`

Root cause: Alpine ships busybox wget. Busybox wget resolves `localhost` to ::1 (IPv6) first and does NOT fall back to 127.0.0.1 (IPv4) on refusal. The verifier binds 0.0.0.0 (IPv4-only). Every healthcheck got Connection refused, and the container was marked unhealthy after 3 retries. As a result, zeroauth-prod (which depends on it via depends_on: service_healthy) never started.

Result: prod was 502 for ~3 minutes between 07:21 UTC and 07:25 UTC, until I manually started zeroauth-prod with --no-deps via SSH. That restored service. The verifier was running and responding to requests fine the whole time; only the healthcheck command was wrong.

Fix: use the literal 127.0.0.1 in both the Dockerfile HEALTHCHECK and the compose-level healthcheck. The two are redundant by design. The compose-level one wins for `docker compose` orchestration, and the Dockerfile HEALTHCHECK wins for `docker run` outside compose. Both need to be correct. A comment in both places explains why localhost is wrong, so the next operator doesn't revert.

Production state right now: zeroauth-prod is up + healthy via the manual --no-deps recovery. The verifier is up + responding but marked unhealthy by Docker. This is cosmetic: it doesn't block anything, since prod is now running without the dependency wait. After this hotfix deploys, both will be healthy and the dependency edge reactivates on next restart.

Verified locally: `docker exec zeroauth-verifier wget -qO- http://127.0.0.1:3001/health` → `{"status":"ok","version":"0.1.0","vkeyAvailable":true,...}`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
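For compose-managed containers, the compose-level healthcheck overrides the image's `HEALTHCHECK`. The quickest way to confirm which command Docker is really running is to read it back from the container config. Below is a small sketch around `docker inspect`; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# effective_healthcheck CONTAINER...
# Prints "<container>: <healthcheck JSON>" from the Healthcheck block Docker
# recorded in each container's config (compose override or image HEALTHCHECK).
effective_healthcheck() {
  local c
  for c in "$@"; do
    printf '%s: ' "$c"
    docker inspect --format '{{json .Config.Healthcheck}}' "$c" || return 1
  done
}
```

After the container has been recreated from the fixed config, `effective_healthcheck zeroauth-verifier` should show a test command that targets `127.0.0.1`.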

Task 4 of today. Formally records the decision Pulkit made yesterday when he picked Plan B over Plan A. Captures the three reasons single-engineer velocity beat the brainstorm's Rust spec, what we gave up (reproducible-build provenance, smaller transitive surface, unsafe-discipline), and what we kept (cross-repo HTTP shape stays Rust-compatible if we ever swap).

Also pins the inline-fallback retirement plan:

- 2026-05-15: verifier shipped, inline path unused but compiled-in
- 2026-05-16 → 2026-06-06: 3-week soak in prod
- 2026-06-08: PR to delete verifyInline + snarkjs from root deps + refuse-to-start when VERIFIER_URL is unset
- 2026-06-09: prod runs verifier-only

References the three shipping PRs (#35 cutover, #36 healthcheck hotfix, #37 SQLite audit log) + the plan-mode design doc + the B02 build prompt that we rejected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
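The 2026-06-08 step includes refusing to start when `VERIFIER_URL` is unset. The commit doesn't say how that check will be implemented. As a shell-level illustration only, a container entrypoint guard could look like the following; `require_verifier_url` is hypothetical.

```bash
#!/usr/bin/env bash
# require_verifier_url: hypothetical entrypoint guard. Fails (return 1) when
# VERIFIER_URL is empty or unset, since there would be no inline fallback left.
require_verifier_url() {
  if [[ -z ${VERIFIER_URL:-} ]]; then
    echo "refusing to start: VERIFIER_URL is unset and there is no inline fallback" >&2
    return 1
  fi
}
```

An entrypoint would call `require_verifier_url || exit 1` before starting the server.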

First issue of the BFSI v1 compliance roadmap, owned by Agent #36 (Chief Compliance Officer). Covers the four certification tracks that gate the 12-month plan: DPDP Act 2023, the four binding RBI Master Directions (IT Governance, Digital Lending, Digital Payment Security Controls, KYC), SOC 2 Type I + Type II, and ISO/IEC 27001:2022. The RBI Sandbox application is tracked alongside as a Q3 deliverable.

Eight sections per the agent-36 W1-Mon ticket:

1. Scope (in/out + India primary, GCC/UK secondary v2 lookahead).
2. Frameworks tracked with auditor + counsel relationships.
3. Q1-Q4 milestones aligned to the phase map in docs/plan/bfsi-v1/00-README.md.
4. Per-quarter deliverables table (D-Qn-NN IDs, owner agent, target week, dependencies) covering the year end-to-end.
5. Audit calendar weeks 1-52 listing every external interaction.
6. Vendor + counsel calendar (DPDP counsel, external cryptographer, SOC 2 auditor, ISO lead auditor, smart-contract audit firm, RBI counsel, bug bounty platform, evidence collector tool).
7. Open dependencies + risks (R-COMP-01..08) with owner + mitigation for each. Explicitly captures the three risks called out in the ticket: DPDP rule notification mid-evidence, evidence-collector tool slip, trusted-setup ceremony slip blocking ISO certification.
8. Document hygiene rules: quarterly retros in docs/compliance/retros/, regulator interaction log in docs/compliance/regulator-log.md, evidence pack rotation each quarter.

Cross-references docs/plan/bfsi-v1/06-ways-of-working.md for the escalation path and docs/threat_model.md for the attack catalogue that control narratives map to. Calls out the trusted-setup ceremony artefact at docs/cryptography/trusted-setup-ceremony.md as the input to ISO Annex A.5.31 and SOC 2 CC6.1 evidence.

[no-test] markdown-only deliverable per ticket. Reviewer: Agent #1.

First issue of the enterprise risk register at docs/compliance/risk/enterprise-risk-register-v1.md. Captures the 10 baseline commercial, operational, regulatory, strategic, security, and financial risks that the founder, CCO, CRO, and Risk & Audit lead carry on their dashboards. Distinct from docs/threat_model.md, which holds the technical attack catalogue (A-NN rows). Each enterprise risk references the threat-model rows it relates to, so the two documents stay bidirectionally linked per the §6.5 operating principle.

Document deliverable A40-W1-Mon from docs/plan/bfsi-v1/agents/agent-40-risk-audit.md. Pairs with the compliance roadmap at docs/compliance/compliance-roadmap-v1.md, whose §7 holds the thinner compliance-bearing subset; this register is the authoritative copy. References docs/threat_model.md throughout (A-02, A-07, A-09, A-10, A-13, A-17, A-21, A-22, A-28), docs/cryptography/trusted-setup-ceremony.md (R-ENT-04, R-ENT-07), and docs/compliance/privacy/data-inventory-v1.md (R-ENT-03 scoping).

Risks are classified by likelihood (1..5) × impact (1..5), with appetite bands accept ≤ 6, review 7-12, reject ≥ 13. At v1, all residuals sit in the auto-accept band after mitigation.

Cadence is a weekly walk by Agent #40, a monthly review with Agent #1 + #36 + #42 on the 15th, and a quarterly board review in the last week of each Q, plus event-driven triggers per §6.3. Sign-offs in §7.

[no-test] markdown-only documentation deliverable. Next review 2026-06-01 per the A40-W2-Mon ticket, which updates the register with commit hashes for closed mitigations.
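The scoring rule is simple enough to write down directly. The score is likelihood × impact on 1..5 scales, banded as accept ≤ 6, review 7-12, and reject ≥ 13. The hypothetical helper below applies exactly those bands.

```bash
#!/usr/bin/env bash
# risk_band LIKELIHOOD IMPACT
# Prints "<score> <band>" where score = likelihood * impact and the bands are
# accept (<= 6), review (7-12), reject (>= 13). Rejects inputs outside 1..5.
risk_band() {
  local likelihood="$1" impact="$2" score band
  if (( likelihood < 1 || likelihood > 5 || impact < 1 || impact > 5 )); then
    echo "risk_band: likelihood and impact must be 1..5" >&2
    return 1
  fi
  score=$(( likelihood * impact ))
  if (( score <= 6 )); then
    band=accept
  elif (( score <= 12 )); then
    band=review
  else
    band=reject
  fi
  echo "$score $band"
}
```

`risk_band 3 4` prints `12 review`. Because neither factor can exceed 5 and 13 and 14 are not products of two such factors, the smallest score that actually lands in the reject band is 15.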

Copilot AI review requested due to automatic review settings May 15, 2026 07:26

Copilot started reviewing on behalf of pulkitpareek18 May 15, 2026 07:26

pulkitpareek18 merged commit 0a0bae3 into main May 15, 2026
1 of 2 checks passed

pulkitpareek18 deleted the hotfix-verifier-healthcheck branch May 15, 2026 07:26

pulkitpareek18 review requested due to automatic review settings May 15, 2026 07:50

pulkitpareek18 merged 1 commit into main from hotfix-verifier-healthcheck
